14 research outputs found

Generative Fractional Diffusion Models

Author: Aversa Marco
Detzel Michael
Ermon Stefano
Knochenhauer Christoph
Lapuschkin Sebastian
Murray-Smith Roderick
Nakajima Shinichi
Nobis Gabriel
Oala Luis
Samek Wojciech
Springenberg Maximilian
Publication date: 26/10/2023

We generalize the continuous time framework for score-based generative models from an underlying Brownian motion (BM) to an approximation of fractional Brownian motion (FBM). We derive a continuous reparameterization trick and the reverse time model by representing FBM as a stochastic integral over a family of Ornstein-Uhlenbeck processes to define generative fractional diffusion models (GFDM) with driving noise converging to a non-Markovian process of infinite quadratic variation. The Hurst index $H \in (0,1)$ of FBM enables control of the roughness of the distribution-transforming path. To the best of our knowledge, this is the first attempt to build a generative model upon a stochastic process with infinite quadratic variation.

arXiv.org e-Print Archive

Data Models for Dataset Drift Controls in Machine Learning With Images

Author: Aversa Marco
Buck Michèle
Clausen Christoph
Extermann Jerome
Matek Christian
Murray-Smith Roderick
Neuenschwander Yoan
Nobis Gabriel
Oala Luis
Pomarico Enrico
Samek Wojciech
Sanguinetti Bruno
Willis Kurt
Publication date: 04/11/2022

Camera images are ubiquitous in machine learning research. They also play a central role in the delivery of important services spanning medicine and environmental surveying. However, the application of machine learning models in these domains has been limited because of robustness concerns. A primary failure mode is performance drops due to differences between the training and deployment data. While there are methods to prospectively validate the robustness of machine learning models to such dataset drifts, existing approaches do not account for explicit models of the primary object of interest: the data. This makes it difficult to create physically faithful drift test cases or to provide specifications of data models that should be avoided when deploying a machine learning model. In this study, we demonstrate how these shortcomings can be overcome by pairing machine learning robustness validation with physical optics. We examine the role raw sensor data and differentiable data models can play in controlling performance risks related to image dataset drift. The findings are distilled into three applications. First, drift synthesis enables the controlled generation of physically faithful drift test cases. The experiments presented here show that the average decrease in model performance is ten to four times less severe than under post-hoc augmentation testing. Second, the gradient connection between task and data models allows for drift forensics that can be used to specify performance-sensitive data models which should be avoided during deployment of a machine learning model. Third, drift adjustment opens up the possibility for processing adjustments in the face of drift. This can lead to speed-up and stabilization of classifier training at a margin of up to 20% in validation accuracy. A guide to access the open code and datasets is available at https://github.com/aiaudit-org/raw2logit

Comment: LO and MA contributed equally.

arXiv.org e-Print Archive

DiffInfinite: Large Mask-Image Synthesis via Parallel Random Patch Diffusion in Histopathology

Author: Alaa Ahmed
Aversa Marco
Chirica Mihaela
Hägele Miriam
Ivanova Daniela
Klauschen Frederick
Murray-Smith Roderick
Nobis Gabriel
Oala Luis
Ruff Lukas
Samek Wojciech
Sanguinetti Bruno
Standvoss Kai
Publication date: 23/06/2023

We present DiffInfinite, a hierarchical diffusion model that generates arbitrarily large histological images while preserving long-range correlation structural information. Our approach first generates synthetic segmentation masks, subsequently used as conditions for the high-fidelity generative diffusion process. The proposed sampling method can be scaled up to any desired image size while only requiring small patches for fast training. Moreover, it can be parallelized more efficiently than previous large-content generation methods while avoiding tiling artefacts. The training leverages classifier-free guidance to augment a small, sparsely annotated dataset with unlabelled data. Our method alleviates unique challenges in histopathological imaging practice: large-scale information, costly manual annotation, and protective data handling. The biological plausibility of DiffInfinite data is validated in a survey by ten experienced pathologists as well as a downstream segmentation task. Furthermore, the model scores strongly on anti-copying metrics, which is beneficial for the protection of patient data.

arXiv.org e-Print Archive

ML4H Auditing: From Paper to Practice

Author: Alejandro Muñoz Alvarado Erick
Balachandran Pradeep
Calderon-Ramirez Saul
Gilli Luca
Jaramillo-Gutierrez Giovanna
Kherif Ferath
Matek Christian
Nobis Gabriel
Oala Luis
Sanguinetti Bruno
Shroff Arun
Werneck Leite Alixandro
Wiegand Thomas
Xie Li Danny
Publication date: 11/12/2020

Serveur académique lausannois

Bryophytes of Europe Traits (BET) dataset: a fundamental tool for ecological studies

Author: Albertos Belén
Bergamini Ariel
Bernhardt‐Römermann Markus
Bisang Irene
Calleja Alarcón Juan A.
Gabriel Rosalina
Garilleti R.
Hedenäs Lars
Hodgetts Nick
Lara Francisco
Nobis Michael P.
Preston Christopher
Simmel Josef
Urmi Edi
Van Zuijlen Kristel
Publication venue: 'Wiley'
Publication date: 01/01/2023

Bryophytes are a diverse group of organisms with unique properties, yet they are severely underrepresented in plant trait databases. Building on the recently published European Red List of bryophytes and previous trait compilations, we present the Bryophytes of Europe Traits (BET) data set, including biological traits such as those related to life history, growth habit, sexual and vegetative reproduction; ecological traits such as indicator values, substrate and habitat; and bioclimatic variables based on the species' European range. The data set includes values for 65 traits and 25 bioclimatic variables, containing more than 135,000 trait values with a completeness of 82.7% on average. The data set will enable future studies in bryophyte biology, ecology and conservation, and may help to answer fundamental questions in bryology.

Repositório da Universidade dos Açores

Biblos-e Archivo

Data models for dataset drift controls in machine learning with optical images

Author: Aversa Marco
Buck Michèle
Clausen Christoph
Extermann Jérôme
Matek Christian
Murray-Smith Roderick
Neuenschwander Yoan
Nobis Gabriel
Oala Luis
Pomarico Enrico
Samek Wojciech
Sanguinetti Bruno
Willis Kurt
Publication venue: Transactions on Machine Learning Research
Publication date: 01/05/2023

Camera images are ubiquitous in machine learning research. They also play a central role in the delivery of important public services spanning medicine or environmental surveying. However, the application of machine learning models in these domains has been limited because of robustness concerns. A primary failure mode is performance drops due to differences between the training and deployment data. While there are methods to prospectively validate the robustness of machine learning models to such dataset drifts, existing approaches do not account for explicit models of machine learning’s primary object of interest: the data. This limits our ability to study and understand the relationship between data generation and downstream machine learning model performance in a physically accurate manner. In this study, we demonstrate how to overcome this limitation by pairing traditional machine learning with physical optics to obtain explicit and differentiable data models. We demonstrate how such data models can be constructed for image data and used to control downstream machine learning model performance related to dataset drift. The findings are distilled into three applications. First, drift synthesis enables the controlled generation of physically faithful drift test cases to power model selection and targeted generalization. Second, the gradient connection between machine learning task model and data model allows advanced, precise tolerancing of task model sensitivity to changes in the data generation. These drift forensics can be used to precisely specify the acceptable data environments in which a task model may be run. Third, drift optimization opens up the possibility to create drifts that can help the task model learn better and faster, effectively optimizing the data-generating process itself to support the downstream machine vision task.
This is an interesting upgrade to existing imaging pipelines which traditionally have been optimized to be consumed by human users but not machine learning models. The data models require access to raw sensor images as commonly processed at scale in industry domains such as microscopy, biomedicine, autonomous vehicles or remote sensing. Alongside the data model code we release two datasets to the public that we collected as part of this work. In total, the two datasets, Raw-Microscopy and Raw-Drone, comprise 1,488 scientifically calibrated reference raw sensor measurements, 8,928 raw intensity variations as well as 17,856 images processed through twelve data models with different configurations. A guide to access the open code and datasets is available at https://github.com/aiaudit-org/raw2logit

Enlighten

New national and regional bryophyte records, 49

Author: Agcagil E.
Akhoondi Darzikolaei S.
Aleffi M.
Araujo C. A. T.
Bakalin V. A.
Bednarek-Ochyra H.
Bojaca G. F. P.
Bruno Silva J.
Calleja J. A.
Cano M. J.
Castillo Diaz J.
Choi H. -G.
Claro D.
Cykowska-Marzencka B.
Dias dos Santos N.
Ellis L. T.
Enroth Johannes
Erzberger P.
Fantacelle L. B.
Gabriel R.
Garcia C. A.
Garilleti R.
Gigante D.
Hajek M.
Hedenäs L.
Heras P.
Infante M.
Kiebacher T.
Kim J. H.
Kirmaci M.
Koczur A.
Kovács A.
Krawczyk R.
Kucera J.
Lara F.
Lebouvier M.
Lüth M.
Maciel-Silva A. S.
Mazimpaka V.
Nagy J.
Nemeth Cs.
Nobis M.
Norhazrina N.
Nowak A.
Plasek V.
Poponessi S.
Rangel Germano S.
Schäfer-Verwimp A.
Sergio C.
Shirzadian S.
Stebel A.
Stryjak-Bogacka M.
Suleiman M.
Vanderpoorten Alain
Venanzoni R.
Vigalondo B.
Virchenko V. M.
Vončina G.
Węgrzyn M.
Wietrzyk P.
Yong K. -T.
Yoon Y. -J.
Publication date: 01/01/2016

Peer reviewed

Archivio istituzionale della ricerca - Università di Camerino

Helsingin yliopiston digitaalinen arkisto

Jagiellonian University Repository

Data models for dataset drift controls in machine learning with optical images

Author: Aversa Marco
Buck Michèle
Clausen Christoph
Extermann Jérôme
Matek Christian
Murray-Smith Roderick
Neuenschwander Yoan
Nobis Gabriel
Oala Luis
Pomarico Enrico
Samek Wojciech
Sanguinetti Bruno
Willis Kurt
Publication date: 18/07/2023

Camera images are ubiquitous in machine learning research. They also play a central role in the delivery of important services spanning medicine and environmental surveying. However, the application of machine learning models in these domains has been limited because of robustness concerns. A primary failure mode is performance drops due to differences between the training and deployment data. While there are methods to prospectively validate the robustness of machine learning models to such dataset drifts, existing approaches do not account for explicit models of the primary object of interest: the data. This limits our ability to study and understand the relationship between data generation and downstream machine learning model performance in a physically accurate manner. In this study, we demonstrate how to overcome this limitation by pairing traditional machine learning with physical optics to obtain explicit and differentiable data models. We demonstrate how such data models can be constructed for image data and used to control downstream machine learning model performance related to dataset drift. The findings are distilled into three applications. First, drift synthesis enables the controlled generation of physically faithful drift test cases to power model selection and targeted generalization. Second, the gradient connection between machine learning task model and data model allows advanced, precise tolerancing of task model sensitivity to changes in the data generation. These drift forensics can be used to precisely specify the acceptable data environments in which a task model may be run. Third, drift optimization opens up the possibility to create drifts that can help the task model learn better and faster, effectively optimizing the data-generating process itself. A guide to access the open code and datasets is available at https://github.com/aiaudit-org/raw2logit

Hes-so: ArODES Open Archive (University of Applied Sciences and Arts Western Switzerland / Haute école spécialisée de Suisse occidentale / FH Westschweiz)

DiffInfinite: Large Mask-Image Synthesis via Parallel Random Patch Diffusion in Histopathology

Author: Alaa Ahmed
Aversa Marco
Chirica Mihaela
Hägele Miriam
Ivanova Daniela
Klauschen Frederick
Murray-Smith Roderick
Nobis Gabriel
Oala Luis
Ruff Lukas
Samek Wojciech
Sanguinetti Bruno
Standvoss Kai

Enlighten

Characterizing Electron–Hole Plasma Dynamics at Different Points in Individual ZnO Rods

Author: Brian P. Mehl
Czekalla C.
Djurisic A. B.
Grundmann M.
House R. L.
James K. Parker
John M. Papanikolas
Johnson J. C.
Joseph A. Puccio
Justin R. Kirschbrown
Kalt H.
Klingshirn C. F.
Leung Y. H.
Lin J. H.
Liu J. Z.
Mehl B. P.
Michelle M. Gabriel
Nobis T.
Ozgur U.
Pavesi L.
Ralph L. House
Song J. K.
Sun L. X.
Szarko J. M.
Takeda J.
van Vugt L. K.
Voss T.
Wang Z. L.
Yang P. D.
Zhang Y.
Zu P.
Publication venue: 'American Chemical Society (ACS)'

Crossref